An Efficient Method for Making Un-Supervised Adaptation of HMM-based Speech Recognition Systems Robust Against Out-Of-Domain Data
Authors
Abstract
Major aspects of cognitive science rely on natural language processing that uses automatic speech recognition (ASR) systems in human-computer interaction scenarios. To improve the accuracy of HMM-based ASR systems, efficient approaches to un-supervised adaptation are the methodology of choice. The recognition accuracy of speaker-specific systems derived by online acoustic adaptation depends directly on the quality of the adaptation data actually used. It drops significantly if sample data that are out of scope (in lexicon or acoustic conditions) for the original recognizer, which generates the necessary annotation, are exploited without further analysis. In this paper we present an approach for fast and robust MLLR adaptation based on a rejection model that rapidly evaluates an alternative to existing confidence measures, so-called log-odd scores. These measures are computed as the ratio of scores obtained from acoustic model evaluation to those produced by a reasonable background model. Using log-odd scores, threshold-based detection and rejection of improper adaptation samples, i.e. out-of-domain data, is realized. Experimental evaluations on two challenging tasks demonstrate the effectiveness of the proposed approach.
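The threshold-based rejection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Utterance` fields, the per-frame normalization, and the threshold value are assumptions made for illustration. The only thing taken from the abstract is the idea of comparing the acoustic-model score against a background-model score (a likelihood ratio, which becomes a difference in the log domain) and discarding samples that fall below a threshold before MLLR adaptation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """Hypothetical container for one candidate adaptation sample."""
    utt_id: str
    acoustic_loglik: float    # log-likelihood under the acoustic (HMM) model
    background_loglik: float  # log-likelihood under the background model
    num_frames: int


def log_odds_score(utt: Utterance) -> float:
    """Log of the likelihood ratio, normalized per frame (normalization is an assumption)."""
    if utt.num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (utt.acoustic_loglik - utt.background_loglik) / utt.num_frames


def select_adaptation_data(
    utterances: List[Utterance], threshold: float
) -> Tuple[List[Utterance], List[Utterance]]:
    """Split samples into accepted (score >= threshold) and rejected lists."""
    accepted: List[Utterance] = []
    rejected: List[Utterance] = []
    for utt in utterances:
        if log_odds_score(utt) >= threshold:
            accepted.append(utt)
        else:
            rejected.append(utt)
    return accepted, rejected


if __name__ == "__main__":
    samples = [
        Utterance("in_domain", -500.0, -700.0, 100),
        Utterance("out_of_domain", -900.0, -880.0, 100),
    ]
    keep, drop = select_adaptation_data(samples, threshold=0.0)
    print("accepted:", [u.utt_id for u in keep])
    print("rejected:", [u.utt_id for u in drop])
```

Only the accepted samples would then be passed on to the adaptation step. In practice the threshold would have to be tuned on held-out data, and the value 0.0 used here is purely illustrative.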
Similar resources
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal degrades significantly in the presence of environmental noise, which impairs the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel speech enhancement of signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
Alert correlation and prediction using data mining and HMM
Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they have some serious drawbacks: when deployed in large, high-traffic networks, IDSs generate high volumes of low-level alerts that are hardly manageable. Accordingly, a recent track of security research has emerged, focused on alert correlation, which ext...
Fast estimation of warping factors in vocal tract length normalization using scores obtained from gender detection modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels and environmental conditions. Making these systems robust to such variations is still a big challenge. One of the main sources of variation among speakers is the difference in their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...